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We give a new method for generating perfectly random samples from the stationary 
distribution of a Markov chain. The method is related to coupling from the past 
^S ! (CFTP), but only runs the Markov chain forwards in time, and never restarts it at 

previous times in the past. The method is also related to an idea known as PASTA 
(Poisson arrivals see time averages) in the operations research literature. Because 
the new algorithm can be run using a read-once stream of randomness, we call it 
read-once CFTP. The memory and time requirements of read-once CFTP are on par 
with the requirements of the usual form of CFTP, and for a variety of applications 
the requirements may be noticeably less. Some perfect sampling algorithms for point 
processes are based on an extension of CFTP known as coupling into and from the 
past; for completeness, we give a read-once version of coupling into and from the past, 
but it remains unpractical. For these point process applications, we give an alternative 
i.^ , coupling method with which read-once CFTP may be efficiently used. 

o 

O i 1. Introduction 

O 



One of the mantras of "coupling from the past" (CFTP), a class of algorithms for generating 
perfectly random samples from a Markov chain, is that one needs to be prepared to re-use 

^ I old random coins. This would appear to rule out any possibility of running CFTP with a 

read-once stream of random coins, such as a Geiger counter, thermal noise (used by the Intel 
hardware random number generator ( |Jun and Kocher, 1999|) ), or other physical source of 
truly random coins, short of storing all the random values somewhere. Nonetheless we give 
here a simple variation on CFTP, whose time and memory usage is competitive with the 

rS ' current prevalent version of CFTP, but which outputs perfectly random samples using just 

c^ ' a read-once source of randomness. Even with re-readable sources of pseudorandom coins, 

which come with seeds that allow one to regenerate previously generated values, there can 
be advantages to using the read-once version of CFTP, particularly if many independent 
samples are desired. 

We give a more detailed review of CFTP in a later section, but for now we state that it is a 
method for generating random samples from the steady state distribution of a Markov chain, 
when the Markov chain is implemented by repeatedly applying a randomizing operation to 
a point in the state space. The method is based on the principle that a Markov chain 
that has already been running for an infinitely long time has already reached its stationary 
distribution. To obtain a random sample, CFTP "figures out" what state the Markov chain 
is in at a given time, by looking at a finite but unbounded number of randomizing operations 
used prior to that time. Usually the "figuring out" part requires cleverness on the part of 
the algorithm designer, and different techniques are used in different applications. Rather 



than extend the range of apphcations to which CFTP may be apphed, our purpose here is 
to give a variation on the method which may be used with most of these apphcations. 

Main Result: Every version of CFTP for which 

1. i.i.d. randomizing operations are used to do the updates, 

2. the algorithm produces the random sample, in its entirety and with full precision, after 
composing finitely many random maps, and 

3. the random maps can he evaluated at a given starting state without affecting coalescence 
detection, 

can he done with a read-once stream of random coins. Furthermore, the expected running 
time and memory usage are never worse hy more than a (small) constant factor. 

The three conditions of the main resuh are satisfied by most algorithms that one would 
normally think of as CFTP, with just a couple of exceptions. The exceptions to condition 
1 are a few algorithms, which might be more properly described as "coupling into and 
from the past," which use a separate Markov chain running backwards in time, rather than 
an i.i.d. process, to generate the randomizing operations used to do the updates. The 
principal exceptions to condition 2 are algorithms given by |van den Berg and Steif (1998| ), and 
[Haggstrom and Steif (199^ ) for infinite spin systems, where there is no hope of outputting 
a sample in finite time, but where there is a "virtual infinite sample," any part of which 
can be revealed to someone who asks to see it. For these algorithms, a re- readable source of 
randomness appears to still be required. |M0ller (1999| ) gave an algorithm for the autogamma 



distribution which outputs not the sample but a neighborhood containing the sample after 
composing finitely many maps, so it too does not satisfy condition 2. But [Wilson (1999| ) 



gave a modification which is not only faster but also satisfies the three conditions of the 
main result, thereby allowing us to use read-once CFTP. At present there are no exceptions 
to condition 3. 

One advantage of read-once CFTP over the prevalent version of CFTP is that one does not 
need to keep track of pseudorandom number generator seeds. As CFTP is typically currently 
used, for even one sample the program keeps track of seeds for a number of independent 
streams of pseudorandom numbers. When many independent samples are desired, many 
independent streams are required. This independence requirement could be a problem if 
one is using e.g. the pseudorandom number generator that comes standard with Unix (BSD 
or libc5), where even if one believes that the stream of numbers produced from any given 
seed is adequately random, the streams produced using different seeds are quite decidedly 
not independent. (The streams started by different seeds are correlated to an extent that is 
quite shocking to someone expecting independence.) Using read-once CFTP for even a large 
number of samples, only one good-quality stream of pseudorandom numbers is needed. 

Read-once CFTP is also advantageous in situations where storage is currently used for 
each time step, even when there is a re-readable source of random coins. In some cases it is 
not feasible to generate an entire random map at once, so the algorithm instead maintains 
partial information about each random map, which is then updated each time the random 



maps are revisited; examples are given by Lund and Wilson (1997 ) and Mira, M0ller, and 



Roberts (1998| ). Read-once CFTP never revisits a random map it has seen before, so it is 



not necessary to either store this partial information, or, more importantly, to write code to 
update this partial information. Many point process algorithms (see e.g. [Kendall and M0ller 



|(1999|) ) also store information for each time step. Although these point process algorithms 
often do not satisfy condition 1 of the main result, in § ^ we are still able to use read-once 
CFTP to sample from these point processes, thereby reducing the storage requirements. 

The read-once version of CFTP given here will satisfy additional pleasant run-time prop- 
erties not mentioned in the claim. In some cases, such as applications of CFTP to Bayesian 
inference, the read-once version of CFTP may be noticeably faster. Other run-time proper- 
ties would be appreciated by someone concerned about the sociological phenomenon of an 
impatient user introducing bias by aborting and restarting the algorithm. For instance, the 
distribution of running times will have exponentially decaying tails, assuming that the effort 
to apply a single random map does not itself have a run time distribution with fat tails. 
The usual version of CFTP also has this property, but the read-once version of CFTP has 
another favorable run time characteristic not shared by the usual version of CFTP. 

As [Fill (19981) has pointed out, the usual version of CFTP will on occasion enter a 



state where the expected additional running time before outputing an answer can be large. 
When this happens, the user may be tempted to abort and start over. In contrast, under 
conventional assumptions (explained in § |^) about the underlying random map procedure, 
the read-once version of CFTP given below does not have this property. For it the expected 
time to completion is never larger than it would be if the user aborted and started over. 
A stochastic domination version of this statement also holds, so it should be the case that 
the user is never tempted to abort. Thus we could say that the algorithm is "temptation 
free." Despite the algorithm being temptation-free, the user with a specific deadline (and 
re-readable randomness) may still prefer to use an "interruptible" algorithm such as Fill's 
algorithm ( [j^'ill, 1998[ ). 

In the remainder of this article we review CFTP in § ^, and then give two derivations of 
read-once CFTP, the first one (in § |^) starts from CFTP, and the second (in § ^) starts from 
another idea known as PASTA (Poisson arrivals see time averages). In § ^ we characterize the 
performance of read-once CFTP. Many interesting applications of CFTP are to unbounded 
state spaces, and in § ^ we give a variation of a subroutine of read-once CFTP that makes 
it easier to use in these contexts. In § |^ we review the coupling into and from the past 
(CIAFTP) algorithms, which do not satisfy the first condition (independence of random 
maps) required by read-once CFTP. We give a read-once version of CIAFTP in § p, but it is 
not very satisfying. As the principal applications of CIAFTP are point processes, we explain 
in § y how to sample from these point processes using instead the version of read-once CFTP 
in § p. 

2. Background on coupling from the past 

Before describing the read-once version of CFTP, we first review the usual version of CFTP. 
More expanded explanations are given by [Propp and Wilson (1996[ ) , [I'm (199"^ ) , Pr'ropp an^ 
Wilson (1998a[ ), and [Wilson (1999| ). 



CFTP requires a randomizing operation which preserves the probability distribution vr 
from which we wish to sample. There are many maps from the state space to itself; the 
randomizing operation effectively picks a random such map according to some distribution. 



Let us consider a toy example: suppose vr is the uniform distribution on the state space of 
permutations on n letters. One possible randomizing operation would pick a random number 
i between 1 and n — 1, and then flip a coin c to decide whether to rearrange the items in 
positions i and z + 1 so that they are in sorted order or in reverse-sorted order. If we per- 
form this operation on a uniformly random permutation, the result will also be a uniformly 
random permutation, so we say that the randomizing operation preserves the uniform dis- 
tribution vr. The (random) pair (z, c) may be used to update any given permutation, so it 
represents a (random) function or map from the state space to itself. We obtain a Markov 
chain by applying the randomizing operation over and over again to a given state; different 
randomizing operations may give rise to the same Markov chain. 

We assume that the randomizing operation is given to us as the procedure RandomMap(). 
Each time that RandomMap() is called, it returns some representation of a random map 
(such as a random pair (z, c) in the above example), and the random map is independent 
of all random maps previously generated. Let /_t denote the map returned the t^^ time 
RandomMapO is called, which we view as the randomizing operation that occured at time 
—t. If a Markov chain is in state x at time — t, then at time — t + 1 it will be in state f^t{x)- 
Thus we view the randomizing operations as having been started infinitely far in the past, 
and they run up until time 0. Let F_t denote the composition of /_i o ■ ■ ■ o /_j, i.e. the net 
effect of the t randomizing operations prior to time 0. If we somehow obtained a random 
state X distributed according to vr, then since the randomizing operation preserves tt, F_t{x) 
will also be distributed according to vr. 

It is easy to see that the event that there is some t such that F_t maps the state space 
to one value, occurs with probability either or 1. Usually it is not hard to ensure that this 
probability is positive, so let us assume that the probability is 1. If F_t maps the state space 
to a single value, then for any t' > t, F_ti will also map the state space to this same value. So 
with probability 1, all but finitely many of the random variables F_i(x), F_2(x), F_^{x), . . . 
will take the same. Since this common value is independent of x, for convenience we denote 
it by F_oo(*). Since the random variables F^i{x),F^2{,x),F_^{x), . . . are each distributed 
according to vr, and with probability 1 they converge to the random variable F_^{*), this 
random variable must also be distributed according to vr. 

CFTP, which is expressed abstractly as in Figure ^ works by determining and then 
outputting the random variable F_oo(*)- Either CFTP runs forever with probability 1, or 
else with probability 1 it successfully determines the state F_oo(*) of the Markov chain at 
time 0, which is distributed exactly according to the desired distribution vr. 

F := (identity map) 

while not Singleton(ImageOf(F)) 

F := F o RandomMapO 
return Element ContainedIn(ImageOf(F)) 

Figure 1: High level pseudocode for coupling from the past. Since random maps are 
composed going backwards in time, the algorithm might be more properly called coupling 
into the past. It has been observed many times that reversing the order of composition in 
the third line would result in a biased algorithm. 



The fact that composing maps backwards in time gives information about the state at 
time 0, which is then a perfectly random sample, appears to have been first noted and ex- 
ploited by [Letac (1986| ). piaconis and Freedman (1999 ) give a survey of this and related 



work. The main use for which this principle was used was to prove the existence of stationary 
distributions of Markov chains. Algorithms based on this principle for sampling from non- 
trivial distributions weren't developed until many years later. The basic problem was a lack 
of effective means of determining when to stop composing the maps. The first (nontrivial) 
algorithms based on the "state at time zero is random" principle was a random spanning 
tree algorithm due to [Broder (1989|) and |Aldous (1990|) , and the dead-leaves process (see 



( [Jeulin, 1997|) ). The tree algorithm is actually more closely related to "coupling into and 

from the past." We say more about this extension of CFTP and these two algorithms in § ^ 

The next development was "monotone-CFTP" ( |Propp and Wilson, 1996D , which is a 



particularly efficient algorithm that can be used when the state space has a partial order ^ 
that is preserved by the randomizing operations {ii x ^ y then f-t{x) :< f-t{y)), and there 
is a biggest state 1 and smallest state 0. These conditions are somewhat restrictive, but a 
surprisingly wide variety of Markov chains of practical interest satisfy these conditions; see 
e.g. the examples given by Propp and Wilson (199C^ ), [Luby, Randall, and Sinclair (199"5| ), 



Felsner and Wernisch (1997 ), Paggstrom, van Lieshout, and M0ller (1999 ), |Lund and Wilson] 



[T997I) , Ivan den Berg and Steif (T998|) , |Nelander (19"98D , |Mira, M0ller, and Roberts (1998| ) 



and [Muri, Chauveau, and Cellier (1998|) . The algorithm in Figure |T| computes compositions 
in the order 

(■ ■ • ((/-I O f-2) O /-s) ■ ■ ■ O f-T+l) O f~T- 

For monotone-CFTP (and most subsequent versions of CFTP), it is much easier to perform 
the composition in the order 

/-I o {f-2 o (/-3 ■ ■ ■ o if-T+i o f-r) ■■■))• 

The reason is that in the end we only need the image of the final composition, and if we 
compose the maps in the second order, then we only need to compute the images of the 
intermediate compositions, rather than having to compute the entire map. (We explain 
below how these images are computed — what's important here is that this computation is 
easy in the monotone setting.) In contrast with the first order of compositions, where we 
compose maps going back in time, doing compositions in the second order requires us to 
pick a starting value — T in some fashion, and compose maps going forwards in time back 
to the present. If the composition is not coalescent (i.e. the image is not a singleton), then 
we pick another starting value even further back in the past. A reasonable choice of starting 
times are times of the form —2^, and the resulting binary-backoff version of CFTP is shown 
in Figure 0. 

Note that the algorithm resets its source of randomness in a manner that ensures that 
for each t, the random map f-t has the same value each time it is used in a composition. 
A priori we should be extremely suspicious of any proposal to pick fresh values for /_t each 
time it is refered to, since then the binary-backoff CFTP in Figure |^ would not properly 
emulate the algorithm in Figure ^ In fact it is a bad idea, and results in a biased algorithm. 
For this reason it is emphasized that the same coins need to be re-used each time that /_t 
is generated and refered to, and it becomes unclear how to proceed with a read-once source 
of randomness. 



BinaryBackoffCFTP (NuinberOfSamples) 
for i := 1 to Number OfSamples { 
T:= 1 
repeat { 

Set := (state space) 
for t := T downto 1 
if t is a power of 2 

Set RandomSeed (seed [i , log2 (t) ] ) 
Apply RandomMap(Set) 
T := 2*T 
} until Singleton (Set) 
output ElementContainedln(Set) 

Figure 2: Pseudocode for the binary-backoff implementation of CFTP, iterated some number 
of times. The variable Set represents the image of the composition f_j' o ■ • • o /_j, or more 
generally a superset of the image. The representation of Set is seldom a naive listing of 
states. Note that a number of independent random number streams are used, and some 
streams of randomness are read multiple times. 

Remark: The spanning tree algorithm and the dead-leaves process are unusual in that 
they compose their maps using the first order, i.e. back into the past rather than from the 
past. As pointed out by Kendall, these algorithms therefore already run with a read-once 
source of randomness. Nearly every other CFTP-type algorithm composes maps forward 
in time from the past, and therefore requires a different method of running with read-once 
source of randomness. 

We briefly return to monotone-CFTP and explain how it computes the images of the 
compositions of random maps when they are composed going forwards in time. Technically 
the precise image of the map is not computed, but rather a superset of the image is computed. 
The superset at time —t is represented in the computer by two bounding states, £_j and U-t, 
and the superset is the interval {x : £_t ^ x ^ «-*}• The bounding values £_t and u^t 
are set to the minimum and maximum states respectively, so that trivially the resulting 
interval is a superset of the image of the initial composite map (indeed of any map). The 
bounding states are updated by the rules U-t+i := f-tiu^t) and l^t+i '■= f~t{(--t)- Since 
the random maps respect the partial order of the state space, by induction we see that the 
interval {x : l-t+i ^ 2; ^ u^t+i} niust be a superset of the image of the map /_t o ■ ■ ■ o f_^. 
We remark that even though the image does not necessarily occupy the entire interval, the 
image is a singleton if and only if the interval is a singleton (i.e. iff It = Ut). 

After the success of monotone-CFTP, there has been a good deal of research on finding 



more classes of applications to which CFTP may be efficiently applied; see e.g. |Propp and 



Wilson (1998BD , [Kendall (19981) , [Kendall (199^ ), [Haggstrom and Nelander (1998[ ), [EuEy 



and Vigoda (1997[) , [Murdoch and Creen (19981) , [Kendall and M0ller (1999| ), [M0ller (1999| ), 



Haggstrom and Nelander (1999[) , [Green and Murdoch (199^ , [Kendall and 'I'honnes (1998[) 



Huber (1998b| ), [Huber (1998a| ), and |Haggstrom and Steif (1998| ). In this subsequent work, 



researchers have studied state spaces that don't have a convenient partial order preserved 
by the random maps, and found other clever mechanisms for effectively representing and 
updating a superset of the image of the composition of the random maps. There is a tradeoff 
in the choice of representation: maintaining the exact image or a very detailed superset of it 
may take a lot of computer effort, while if too course a superset is maintained, coalescence 
may not be readily detected. 

We remark that there are also a number of perfect sampling algorithms based on "Fill's 
algorithm" ( Fill, 1998|) rather than CFTP (see e.g. [Fill (1998|) , [Thonnes (19"99D , |M0ller and 



Schladitz (1998| ), and [Fill, Machida, Murdoch, and Rosenthal (1999| )), and that there are 



Markov chain-based perfect sampling algorithms based on neither method (see e.g. [Asmusse"n 



Glynn, and Thorisson (1992| ), jAldous (1995[ ), [Lovasz and Winkler (1995[ ), and Pr'ropp anB 



Wilson (1998"E[) ). 



3. Read-once CFTP 

In this section we explain the read-once randomness version of CFTP, for which pseudocode 
is given in Figure |^. Read-once CFTP may be viewed as a retroactive stopping rule. It 
applies random maps going forwards in time, and then at some point it decides to stop, and 
then returns not the current state, but some previous state. 

A key part of read-once CFTP is a composite random map procedure, which uses the 
ApplyRandomMap procedure as a subroutine. From the standpoint of read-once CFTP, it 
appears as if the composite map procedure generates a random map, makes some effort to 
determine whether or not the map is coalescent (i.e. whether or not it maps all states to 
one state), and then evaluates the map at a given input state to obtain an output state. 
The composite random map preserves the desired probability distribution, in that if the 
input state is distributed according to the desired distribution, then so is the output state. 
If the procedure determines (by examining the representation of the superset of the image 
of the map) that the random map is coalescent, then we say that the map is "officially 
coalescent." Otherwise the map is not officially coalescent, and it may or may not map all 
states to one state. It is important that the choice of input state at which the random map 
is evaluated does not affect whether or not the composite map procedure detects coalescence 
(since otherwise it would not appear as if the procedure tested for colescence and then 
evaluated the random map at the input state). For efficiency reasons, we design the procedure 
so that it produces an officially coalescent random map with probability p > 1/2. We assume 
that subsequent invocations of the composite map procedure are independent. Later we 
explain how to implement a composite random map that meets these requirements, but first 
we see how to use it for read-once CFTP. 

Suppose the composite map procedure gave us the entire random map, rather than just 
evaluating it at one state. Then we could do CFTP, composing new composite maps going 
back in time. Let /_t be the first (closest to time 0, smallest T) composite map that is 
officially coalescent. T is a geometric random variable with mean 1/p < 2. /_t is a random 
composite map conditioned to be officially coalescent, and is furthermore independent of 
T. Let S be the state in the image of /_t. CFTP would then apply the composite maps 
f-T+i, ■ ■ ■ , /-I to S, and return the result. The composite maps /_t+i, • • • , /-i are i.i.d. 
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random composite maps conditioned not to be officially coalescent, and are independent of 
S. So we could equivalently generate T — 1 fresh random composite maps conditioned not to 
be officially coallescent, and apply them to S. Furthermore, there is no need to count T. We 
can simply update S using fresh composite random maps, until one of the maps is officially 
coalescent, and return the value of S prior to the last composite map. (This is where we use 
the i.i.d. condition.) 

Thus to generate a random sample, we make random composite maps until we see two 
that are officially coalescent, and compose those maps between the ffist coalescent map 
(inclusive) and the second coalescent map (exclusive). Since the second of the officially 
coalescent composite maps is used only as a stopping criterion, and is not itself included in 
the composition of maps which results in the random sample, this second coalescent map 
is independent of the returned random sample, and so may be used in the generation of a 
subsequent independent random sample. If k random samples are desired, the last sample 
is returned upon the generation of the {k + 1)^* officially coalescent composite map. 

ReadOnceCFTP (NumberOfSamples) 
Initialize() 

for i := 1 to NumberOfSamples 
output NextSample() 

Initialize () 

State := (arbitrary state) 
repeat 

ApplyCompositeMap(State,CoalescenceFlag) 
until CoalescenceFlag 

NextSample () 
repeat 

OldState := State 

ApplyCompositeMap(State, CoalescenceFlag) 
until CoalescenceFlag 
return OldState 

Figure 3: Pseudocode for the read-once version of CFTP, iterated some number of times. 
The Initialize() routine generates a random map conditioned to be officially coalescent. The 
NextSample() routine uses this map, and produces a random sample from the desired distri- 
bution, as well as an independent random map conditioned to be officially coalescent, to be 
used in the subsequent call to NextSample(). The ApplyCompositeMap() procedure over- 
writes the input state with the output state, and stores in CoalescenceFlag a Boolean value 
reporting whether or not the random composite map is officially coalescent. This version of 
CFTP uses only one pseudorandom stream, and random values never need to be re-read. 



If there were some standard notation for reasoning about algorithms that produce random 
outputs, we might be able to re-express the previous discussion more symbolically in a 
manner such as the following. Here we have assumed that there is a positive probability that 
a random (composite) map is coalescent, and we have let /(*) denote the unique element in 
the image of a coalescent map /. 



X := draw from vr 

/ := officially coalescent random map 

output X, /(*) 



V 



V 



T :=0 

F := (identity map) 
while F is not coalescent 

V T:=T + 1 
f^T '■= RandomMapO 
F ■.= Fo f_T 

f := officially coalescent random map 
output F(*),/(*) 

T :=0 
repeat 

T ■=T + 1 

f-T '■= RandomMapO 
until f^T is officially coalescent 
/ := officially coalescent random map 
output /_i(- ■ ■ /-t+i(/-t(*))), /(*) 

T :=0 

repeat 

T ■=T + 1 

f^T '■= RandomMapO 

until /_T is officially coalescent 

/ := officially coalescent random map 

output /_T+l(- ■ ■ /-l(/(*))), /-t(*) 

/ := officially coalescent random map 

State := /(*) 

repeat 

V OldState := State 
/ := RandomMapO 
State := /(State: Exp ) 

until / is officially coalescent 
output OldState, State 

In the composite map procedure given in Figure ^, we independently and in parallel 
update two subsets of the state space, each representing (a superset of) the image of a 
random map. Initially the two maps are the identity map, so that the two subsets are 
initially the whole state space. At each step we update the first set with ApplyRandomMap, 
and update the second set similarly but with an independent random map. We keep doing 
these parallel updates until the second map is officially coalescent. The number of times 
that the first subset was updated is independent of the mappings used to do its updates. 
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Therefore the first mapping preserves the desired probabihty distribution on the state space. 
Furthermore, since it is with probabihty at least 1/2 that the first subset becomes a singleton 
no later than the second subset, it is with probability at least 1/2 that the first mapping is 
officially coalescent. (This is where we used condition 2, coalescence in finite time.) 

ApplyCompositeMap (State, CoalescenceFlag) 

Setl := (state space) 

Set2 := (state space) 

while not Singleton(Set2) 

ApplyRandomMap(Setl, State) /* apply same random map to Setl and State */ 
ApplyRandomMap(Set2) /* but apply independent random map to Set2 */ 

CoalescenceFlag := Singleton(Setl) 

Figure 4: Interleaved version of ApplyCompositeMap(). ApplyRandomMap() takes an 
optional argument State that is updated according to the generated random map. 



4. Read-once CFTP and PASTA 

We obtained this read-once version of CFTP by starting with the usual version of CFTP 
and modifying it. It would also have been possible to start with what is known as PASTA in 
the queuing theory and operations research literature, make suitable changes, and arrive at 
read-once CFTP. In this section we explain PASTA and this alternate derivation of read-once 
CFTP. 

PASTA is a statement about a stochastic process evolving in time, and discrete events 
which affect the stochastic process and occur at times given by a Poisson process. PASTA 
stands for "Poisson arrivals see time averages," which means that the steady-state distribu- 
tion of the stochastic process averaged over all times is identical to the steady-state distri- 
bution of the process sampled at the times just prior to the Poisson events. |Wolff' (1982| ) 



introduced the concept of PASTA, and showed that it holds whenever the stochastic process 
cannot anticipate the future driving events. Since that time there have been many articles 
on applications and generalizations of PASTA, which go by a variety of acronyms, including 
ASTA, ESTA, EATA, EPSTA, CEPSTA, and MUSTA; reviews are given by [Melamed and 



Whitt (19901 ) , [Bremaud, Kannurpatti, and Mazumdar (1992| ) , and [Melamed and Yao (1995| ) . 



A discrete-time version of PASTA would be a statement about a discrete time Markov 
chain, and random events that occur at integer times. If there is an event at a given time, 
then the next state of the Markov chain is drawn according to one transition rule, while if 
there is no event, then a different transition rule is used. Discrete- PASTA would state that 
the distribution of the Markov chain sampled at times just prior to when events occur will 
be identical to the steady-state distribution of the Markov chain. 

In read-once CFTP, an event occurs precisely when a composite map is officially coales- 
cent. Imagine first randomly picking those integers at which events occur. If there is an event 
at a given time, then the Markov chain is updated by a random composite map conditioned 
to be officially coalescent, otherwise it is updated by random composite map conditioned 

10 



not to be officially coalescent. Discrete- PASTA asserts that the if we draw samples from the 
Markov chain at times just prior to when the composite maps are officially coalescent, the 
steady-state distribution of the draws will be the steady-state distribution of the Markov 
chain. PASTA is a statement about the steady-state behavior of the draws; in general the 
ffist several draws taken at positive times will be out of equilibrium. In this particular ap- 
plication of PASTA, since there is a coalescent map between draws, not only are draws after 
the first one easy to compute, but they also must necessarily be independent of one another. 
Since the draws are independent, any particular draw is already in the steady-state distri- 
bution. Read-once CFTP ignores the first draw (since it is neither in equilibrium nor easy 
to compute), and outputs the subsequent draws until the desired number of independent 
perfectly random samples are generated. 

We remark that CFTP and PASTA are not completely unrelated ideas. The "time zero 
sees time averages" principle behind CFTP can be used to derive the "Poisson arrivals see 
time averages" in both the continuous and discrete settings. Perhaps further connections can 
be made between perfect simulation algorithms and the various generalizations of PASTA. 

5. Performance of read-once CFTP 

Expected running time 

Let T/v be the expected number of random maps that we need to compose before the compo- 
sition is officially coalescent. We should not hope to have an algorithm that uses many fewer 
than T/v random maps. The usual binary-backoff CFTP uses between 2T/v and ATj^j random 
maps, with the constant typically being around 2/ log 2 ^ 2.9. The read-once version of 
CFTP will use on average 4{k + l)T/v random maps to generate k samples: The expected 
time to generate a composite map is 2T/v. The time to generate an officially coalescent com- 
posite map is < 4T;v, and we do this k + 1 times to generate k samples. If updating a single 
state is about as expensive as updating a whole set of states, then since each composite map 
does on average T/v state updates in addition to the expected 2TAr set updates, we expect 
to do 6{k + l)T/v updates altogether. 

In some applications of CFTP, particularly on continuous state spaces, applying the first 
several random maps can be enormously more expensive than applying subsequent random 
maps, because initially the updated set is the whole state space, and later it is smaller. 
The binary-backoff version of CFTP does these expensive updates a number of times that 
is logarithmic in T/v, while the read-once version of CFTP does the expensive updates on 
average 4 times (plus another 4 times for the first sample). Therefore, for these applications, 
the read-once version of CFTP could be up to logarithmically faster than the binary-backoff 
version of CFTP. This issue of logarithmic slowdown when the initial updates are expensive 
has come up before. For instance, one algorithm given by [Propp and Wilson (IQQS'EI ) used 



random maps with this sort of run time variability, and used composite maps to avoid the 
logarithmic slowdown. 

Tail distribution of the running time 

Suppose S and T are subsets of the state space, and / denotes a random map. It is clear that 
if S* C T then f{S) C /(T). The ApplyRandomMap procedure when applied to S updates S 
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to a superset of f{S), so conceivably this superset of f{S) may not be contained within the 
corresponding superset of f(T). But in practice every Apply RandoniMap procedure that 
anyone uses respects subset inclusion, meaning that the superset of f{S) is contained within 
the superset of f{T). [Kendall and M0ller (19991) call this property the funnelling property. 

Under the assumption that the ApplyRandomMap procedure satisfies the funnelling 
property, the tail distribution of the running time decays geometrically with decay con- 
stant that is a (universal) constant multiple of T/y. A similar upper bound holds for the 
usual CFTP, but if the underlying Markov chain has a sharp threshold, the tail distribution 
could be even tighter. 

As mentioned above, the new version is "temptation free," whereas the usual version 
occasionally enters states where the user may be tempted to abort and restart (provided that 
the funnelling property holds in that the random maps each take the same amount of time to 
apply). The temptation- free property holds provided that the user does not look at the value 
of the counter, or if the user might do such a thing, the alternative random map procedure in 
Figure |^ can be used instead since it has no counter. Because of the funnelling property, the 
number of iterations before the composite map procedure returns is always stochastically 
dominated by the number of iterations required by a fresh call to Apply CompositeMap. 
Furthermore, the number of calls to apply composite map before the next several samples 
are returned is stochastically dominated by the number of such calls if the user were to 
restart ReadOnceCFTP. Therefore, under these assumptions about ApplyRandomMap, the 
user will never get his or her desired samples more rapidly by interrupting and restarting the 
ReadOnceCFTP procedure. (As mentioned by [Propp and Wilson (1998bD and [Fill (199"8| ), 



for some applications the underlying random maps take a variable amount time to apply. 
For these applications one should not expect ReadOnceCFTP to yield a temptation-free 
sampling algorithm, nor should one expect Fill's algorithm to yield an interruptible sampling 
algorithm.) 

Memory 

The memory required for the binary-backoff version of CFTP is the memory to store a subset, 
plus the memory to store two integers (T and t). The memory required for read-once CFTP 
using the interleaved version of the composite random map procedure is twice the memory 
to store a subset plus the memory to store a state. In principle there is no upper bound on 
the storage requirements for these integers, so in principle the new version of CFTP could 
have smaller memory requirements, but in practice finding memory for these integers is a 
non- issue. 

More significant is the effect of the constant factor increase in memory requirements 
associated with storing two subsets of the state space. Computers typically contain several 
different types of memory, including an LI cache, an L2 cache, and a main memory composed 
of DRAM. The memory close to the processor is fast, expensive, and small, while the main 
memory is slow, cheap, and large (see e.g. ( [Hennessy and Patterson, 1994| , Chapter 7)). Even 
if the simulation still fits within main memory, if less of it fits within the caches, performance 
will degrade. |Yan (1998|) did timing experiments of a wide variety of sizes of Ising model 



simulations, and reported that it was quite noticeable when the next slower type of memory 
started to be used. 
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Therefore, unless the memory requirements are quite small, we recommend instead the 
version of the composite map procedure given in Figure ^j. Rather than updating the two 
sets in parallel, only one set is updated, and then later only the other set is updated. Only 
one subset, a state, and an integer need to be stored. This version of the procedure behaves 
in the same manner as the interleaved version, unless the counter overflows. Even if the 
count were to overflow the integer, while the run time performance could be affected slightly, 
the distribution of the output of the algorithm is still identical to the desired distribution. 

ApplyCompositeMap (State , CoalescenceFlag) 
Set := (state space) 
Count := 
while not Singleton(Set) 

Apply RandomMap(Set) 

Count := Count +1 
Set := (state space) 
while Count>0 

ApplyRandomMap(Set, State) /* apply same random map to Set and State */ 

Count := Count— 1 
CoalescenceFlag := Singleton(Set) 

Figure 5: Memory-efficient version of ApplyCompositeMap(). 



Overall 

In some circumstances, but certainly not all, it may be preferable to use read-once CFTP. 

6. Read-once CFTP and unbounded state spaces 

In this section we describe a small modification to read-once CFTP that makes it easier to 
use with unbounded state spaces. 

For some applications of CFTP to sampling from unbounded state spaces, it is convenient 
to mix two different Markov chains on the same state space. For instance, [Murdoch (1999[) 



describes examples where the natural Markov chain for a state space has favorable mixing 
properties when started from most typical states, but that when started from points "very 
far away" in the tails of the stationary distribution, the time to randomize can get arbitrarily 
large. Such a Markov chain is "non-uniformly ergodic," and it has been observed by a number 
of authors (for example P:'bss and 'Iweedie (1998| ) ) that if we do CFTP using such a Markov 
chain in a straightforward fashion, coalescence takes infinitely long. (The reason is that the 
coupling time upper bounds the worst case mixing time (see e.g. [Aldous (1983| )), which is 
infinite for non-uniformly ergodic Markov chains.) 

To speed up the convergence time to a finite value, [Murdoch (19991 ) suggested mixing the 



natural Markov chain with another Markov chain called the "independence sampler." Details 
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of how to do this can be found in ( [Murdoch, 1999| ); the algorithm for point processes in 



also serves as an illustrative example. We mention here that the effect of the independence 
sampler is to map the entire state space to a bounded region, but otherwise the independence 
sampler has poor convergence properties. If an algorithm occasionally makes moves using 
the independence sampler, but most of the time using the natural Markov chain, then the 
convergence time will be finite, and reasonably fast for the examples considered by Murdoch. 
Murdoch's solution is fairly effective, and upon learning of it, [Wilson (19991) used it in a 
perfect sampling algorithm for the autonormal distribution. 

Murdoch (1999) originally suggested flipping a suitably biased coin at each time step to 
decide whether to update using the independence sampler or the natural Markov chain. But 
for the point process example in § |^, if the independence sampler is applied to frequently, 
it tends to disrupt coalescence detection. For the autonormal algorithm given by |Wilson| 
|(1999|) , independence sampler updates are more expensive than normal updates. But if the 
independence sampler is used too infrequently, the expected run time is guaranteed to be 
large. An alternative is to let the composite map procedure determine on its own what the 
right mixing ratio is. 

Our recommendation for applications using the independence sampler is to let read-once 
CFTP's composite random map procedure do one update from the independence sampler, 
and do subsequent updates using the natural Markov chain. This change is most easily 
made by replacing the lines which initialize Set to the whole state space with lines that 
instead initialize it to the result of the first random map, as shown in Figure |^. For both 
the autonormal and point process applications, using the independence sampler for only the 
first update also helps simphfy the code. 

ApplyCompositeMap (State, CoalescenceFlag) 
Set := ImageOfFirstRandomMapO 
Count := 
while not Singleton(Set) 

ApplyRandomMap (Set ) 

Count := Count +1 
Set := ImageOfFirstRandomMap (State) 
while Count>0 

ApplyRandomMap(Set, State) /* apply same random map to Set and State */ 

Count := Count— 1 
CoalescenceFlag := Singleton(Set) 

Figure 6: Version of composite map procedure suitable for many applications of read-once 
CFTP on unbounded state spaces. The first random map may implement a fundamentally 
different Markov chain with the same stationary distribution, or in some cases the first map 
may be distributed in the same manner as subsequent random maps, but it is convenient to 
treat it differently. Like ApplyRandomMap (), ImageOfFirstRandomMapO takes an optional 
argument State that is updated according to the generated random map. 
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Remark: Since the random maps within the composite map procedure are no longer all 
identically distributed, it is no longer automatic that the tail distribution of the running time 
decays exponentially. For the applications to the autonormal ( [Wilson, 1999 ) and to point 



processes (§ y), and perhaps for other applications, it is elementary to show that the tails 
still decay exponentially. But conceivably for some application the distribution could have 
fat tails, and the expected running time could even be infinite. Under such conditions, [Luby 



Sinclair, and Zuckerman (1993|) recommend restarting (with the independence sampler) after 



runs (of the natural Markov chain) of lengths 1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, 1, ... . They 
showed that this sequence is a "universal restart sequence" that can be used to speed up the 
expected time for an event to happen when the running time distribution has fat tails. But 
until fat-tailed coalescence time distributions show up in perfect simulation, we recommend 
that only the first update be made using the independence sampler. 

In other applications of CFTP on unbounded state spaces, it is convenient to implement 
the first random map in a different manner than subsequent random maps, even though 
from a mathematical standpoint the random maps themselves are drawn from the same 
distribution. For instance, [Haggstrom, van Lieshout, and M0ller (199P| ) consider the Widom- 



Rowlinson model on a finite region (such as a unit square) together with a monotone Markov 
chain. With probability 1, a random state will consist of finite number of red points and a 
finite number of blue points from this region. In the natural partial order, we have X ^ Y 
if each red point in configuration X is also a red point of configuration Y, and each blue 
point of configuration Y is also a blue point of configuration X. There is no top state or 
bottom state in this partial order if we restrict our attention to sets with finitely many red 
and blue points, which is inconvenient if we wish to run monotone-CFTP. But there are top 
and bottom states with very geometric interpretations: in the top state each point in the 
region is red and no points are blue, and vice versa for the bottom state. Conveniently, after 
applying one random map, each state in the image of the map is sandwiched in between an 
upper state and a lower state, both of which with probability 1 contain only finitely many 
points. From an implementation standpoint, it is inconvenient (though possible) to create 
a data structure that can represent finite collections of red and blue points, as well as the 
uncountable collections of points that we would need to represent the top state in bottom 
state. Since after one random map we only need to represent finite collections of points, 
it is sensible to choose a simpler data structure that can only represent finite collections of 
points, and treat the first random map as a special case. Another application for which it is 
convenient (though not necessary) to treat the first random map as a special case, on account 
of the state space being very large, is the autogamma sampler given by |M0ller (1999| ). 

To run read-once CFTP on applications for which it is convenient to treat the first random 
map has a special case, as before (Figure ^), we replace the lines initializing Set to the whole 
state space with lines initializing it to the result of the first random map. Since the random 
maps within the composite map are still i.i.d. even though they are implemented differently, 
it is once again automatic that the tail distribution of the running time decays geometrically. 

7. Background on coupling into and from the past 

The CFTP-type algorithms which do not use independent random maps (i.e. don't satisfy 
condition 1 of the main result) are the "coupling into and from the past" (CIAFTP) algo- 
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rithms. These algorithms extend CFTP, and were introduced by [Kendall (1998| ), though he 
did not use this term. There are comparitively few applications of coupling into and from the 
past, as opposed to coupling from the past, but new ones may be developed as more people 
become aware of this worthwhile technique. In the next two sections we adapt CIAFTP 
algorithms to the setting of read-once randomness, and as preparation, we review the basic 
method here. 

For concreteness, we explain coupling into and from the past by means of an example, 
which we then generalize and modify. Recall that in § ^ we mentioned that the dead-leaves 
process and the spanning tree algorithm of [Aldous (19901) and p3roder (1989|) were both 



based upon the "state at time zero is random" principle. In the same way that (as [Kendall 



and Thonnes (1998|) point out) the dead leaves process can be regarded as an early form of 



CFTP, where the random maps are composed going backwards in time into the past rather 
than forwards in time from the past, the Aldous/Broder spanning tree algorithm is an early 
form of coupling into and from the past. The comparison "dead-leaves : CFTP :: spanning 
tree : CIAFTP" is sufficiently compelling that we explain both algorithms together. 

As Broder and Aldous explain in their writeups of the spanning tree algorithm, there are 
two different Markov chains that are run together in a coupled fashion. The target Markov 
chain (that we wish to sample from) is on the set of rooted spanning trees of a given graph. 
The other Markov chain, which we shall call the reference chain, is the simple random walk 
on the given graph. It is assumed that we already know how to sample from the reference 
chain. In the applications of coupling into and from the past given by [Kendall (1998[ ), 
Kendall and M0ller (1999[) , and [Lund and Wilson (1997]) , the chain that we already know 



how to sample from is called the dominating chain, since its values stochastically dominate 
the values of the target chain. In the spanning tree application however, there is no natural 
partial order, or at least none that anyone has found, so the term "dominating chain" is not 
appropriate in general. 

The target chain for rooted spanning trees moves the root to a random neighboring 
vertex, adjoins an edge directed from the old root to the new root, and then removes the 
edge directed out of the new root. It is an interesting exercise to verify that this Markov 
chain preserves the uniform distribution on rooted spanning trees. 

The coupling between the target chain and the reference chain is such that the random 
walk on the graph follows the same trajectory as the root of the spanning tree. The algorithm 
picks a random value for the reference chain, runs it backwards in time, and attempts to 
determine the state of the target chain at time 0. Since the algorithm can use the information 
provided by the reference chain, it is only necessary to test if those spanning trees with a 
specified root, rather than all rooted spanning trees, coalesce to a single spanning tree by 
time 0. The coupling used to determine this is actually quite similar to the coupling used 
in the dead leaves process. We can view the rooted spanning tree as a vector assigning each 
vertex to its parent, and the dead leaves process as a vector (indexed by M^!) describing how 
the dead leaves tesselation looks at each point. As the target chain runs forward, parts of the 
vector are overwritten with new values, and the remaining parts are left untouched. For the 
spanning trees, the vector is changed only at the old root and the new root. This overwriting 
process is inherently Markovian, and is determined by the reference chain. For the dead leaves 
process, the overwriting is determined by an i.i.d. process. It is easy to directly compose 
backwards in time the random maps determined by an overwriting process: initialize with 
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a clean slate, and then only overwrite those portions that have not yet been touched. Keep 
going until every part of the vector has been touched. 

Since simple random walk on an undirected graph is reversible, it is easy to run the 
reference chain backwards in time and use it to determine what the overwriting process 
did in the past. Later [Kandel, Matias, Unger, and Winkler (1996| ) had reason to generate 



random spanning trees from an Eulerian directed graph, i.e. a directed graph where the 
in-degree of any vertex is also its out-degree. While simple random walk on the Eulerian 
graph is no longer reversible, they pointed out that the time reversal of this directed random 
walk is still easy to simulate, so that essentially the same method can be used to generate 
random spanning trees on directed Eulerian graphs. (Other tree algorithms that work for 
more general directed graphs are given by Propp- Wilson.) 

If we abstract away the particulars of the spanning tree algorithm while maintaining the 
overall strategy, we get the "coupling into and into the past" procedure, for which pseudocode 
is given in Figure |^. The algorithm generates a sequence of states . . . , X_3, X_2, X_i, Xq of 
the reference Markov chain and a sequence of random maps . . . , /_3, /_2, /-i of the target 
Markov chain, so that for each time — T in the past, the following two properties hold 

1. X_T is distributed according to vTref. 

2. The pairs (X_t_|_i, /_() (for — T < — t < —1) look as if they were generated by the 
"useful coupling" between the reference Markov chain and random maps of the target 
Markov chain. 

Naturally, for any given value of — T, it would be straightforward to simply start at time 
— T and generate the pairs [X^t+i, f~t) running forwards in time. As with the usual version 
of CFTP, the algorithm attempts to determine the state of the target Markov chain at 
time 0. If there is only one possible value for the state at time 0, then this state is a 
draw from the stationary distribution of the target chain. Of course the algorithm does not 
know ahead of time what time — T to start with, so it must be able to generate the states 
. . . , X_3, X_2, X_i, Xq of the reference chain and the random maps . . . , /_3, /_2, /-i of the 
target chain going backwards in time rather than forwards. 

To generate the states X_i and random maps f-t going backwards in time, first the state 
Xq is drawn from the stationary distribution vTref of the reference Markov chain. Then the 
time-reversal of the reference Markov chain is run, thereby producing the requisite sample 
path ... ,X_3,X_2,X_i,Xo of the reference chain up to time 0. (The reference chain is 
typically reversible, so that running its time-reversal is the same as running the original 
Markov chain.) Implicit in property 2 above is that the conditional distribution of /_< is a 
function of X_t and X_(+i alone. Recall that given the state X_t of the reference chain at 
time —t, the pair {X_t+i, f-t) is supposed to be distributed according to the presepecified 
"useful coupling". Since both X_t and X_t+i are generated before the random map /_i, the 
map /_i must be coupled to them ex post facto, i.e. it must be sampled from a conditional 
distribution. 

Coupling into and from the past (see Figure P) is to coupling into and into the past 
(Figure ^ as binary-backoff coupling from the past algorithm (Figure Q) is to the coupling 
into the past (Figure |I]). The reference Markov chain is still run backwards in time, but 
to test for coalescence, the random maps of the target Markov chain are composed going 
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X := ReferenceChainRandoniState() 

F := (identity map on target chain) 

while not Singleton(IniageOf(F restricted to states compatible with X)) 

X' := ReverseReference Chain (X) 

F := Fo TargetChainRandomMapCoupledExPostFacto(X',X) 

X:=X' 
return ElementContainedIn(ImageOf(F restricted to states compatible with X)) 

Figure 7: High level pseudocode for coupling into and from the past. The state X of 
the reference chain may be used when determining the image of the map F. Since both 
the target chain random maps are composed going backwards in time, in addition to the 
reference chain being run backwards, the algorithm might be more properly called coupling 

into and into the past. 

forwards in time. In this way, as before, the algorithm need only maintain the images of the 
random maps of the target chain as the maps are composed. Observe that the state of the 
reference Markov chain at any given time contains implicit information about the random 
mappings of the target Markov chain at all previous times. This implicit information can be 
taken into account when determining the possible states of the target Markov chain at time 
0. Making use of this implicit information about previous not-yet-generated random maps 
is what distinguishes coupling into and from the past from ordinary CFTP, and enables it 
to generate perfectly random samples using "non-uniformly ergodic" Markov chains, which 
cannot be done using ordinary CFTP. 

Remark: Since (1) the coupling into and from the past algorithm accesses the random 
maps of the target chain going forwards in time, (2) these random maps are coupled ex 
post facto to the sample path of the chain, and (3) the sample path of the reference chain 
is generated going backwards in time, it is necessary to either store in memory the entire 
sample path of the reference chain, or else to regenerate portions of it as needed. A similar 
situation exists in Fill's algorithm, and [Fill (199"8|) describes how to store portions of the 



sample path so that not too much memory is used, yet so that not too much time is spent 
regenerating the path. 

Remark: A few years ago it was asserted that CFTP could not be used with the so-called 
non-uniformly ergodic Markov chains. However, a variety of algorithms based on coupling 
into and from the past (e.g. [Kendall (1998|) , [Kendall and M0ller (1999]) , and P^und and Wilson 



(1997[ )) do in fact generate perfectly random samples using non-uniformly ergodic Markov 
chains. One possible interpretation of this asserted impossibility result is that coupling into 
and from the past contains within it another idea. We are optimistic that there may be 
additional clever ideas in the area of perfect simulation. 

8. Read-once coupling into and from the past 

Since the random maps used in coupling into and from the past are not independent of one 
another, but rather are generated by a Markov process (the reference chain), condition 1 of 



X[0] := ReferenceChainRandomState() 
T:= 1 
repeat { 

SetRandomSeed(seedl[log2(T)]) 
for t := [T/2J + 1 to T 

X[t] := ReverseReferenceChain(X[t — 1]) 
Set := (portion of state space compatible with X[r]) 
for t := T downto 1 
if t is a power of 2 

SetRandomSeed(seed2[log2(t)]) 
ApplyTargetChainRandoniMapCoupledExPostFacto(X[t],X[t — l],Set) 
T:=2*T 
} until Singleton(Set) 
output ElenientContainedln(Set) 

Figure 8: Pseudocode for the binary-backoff implementation of CIAFTP. The states 
. . . , X_t, . . . , X_i, Xq of the reference Markov chain are generated going back in time and 
stored, and then read going forwards in time; p^'ill (1998D describes a way to store many fewer 



values without excessive recomputation. Since the reference and target Markov chains are 
coupled, knowledge of X[T] contains information about previous not-yet-generated random 
maps of the target chain, which may rule out some values of the state of the target Markov 
chain. The variable Set represents the image of the composition f-T°- ■ -^f-t when restricted 
to this set of possible values, or more generally a superset of the image, and is almost never 
a naive listing of the states. 

the main result is not satisfied. Therefore our main result does not imply that coupling into 
and from the past can be run with a read-once source of randomness. In this section we give 
a protocol for read-once coupling into and from the past. 

For read-once coupling into and from the past, as with read-once CFTP, it is convenient 
to work with a composite random map that has an approximately 1/2 chance of being 
coalescent. Of course now the composite random map takes as input in initial state for the 
reference Markov chain, and produces a final state for the reference Markov chain as well as 
a random map for the target Markov chain. The probability of coalescence is in general a 
function of the initial state X for the reference chain, but when this initial state is random 
(drawn from the steady state distribution vTref of the reference Markov chain), the composite 
random map that we will use has probability 1/2 of being coalescent. 

Imagine that we have a sequence of such composite maps ft defined for each integer 
time t. Let Ct be the indicator random variable which is 1 if the tth composite map ft is 
coalescent, and otherwise. In the case of read-once CFTP, the random variables Ct are 
i.i.d. All that we can say here is that the Ct form a stationary process, which is to say that 
we may shift their indices by one, and obtain an identically distributed process. 
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Let A be the set of states X of the reference Markov chain for which the composite 
map procedure, when given X as input, generates a map that is coalescent with probabihty 
at least 1/4. Markov's inequahty tells us that a random X (drawn from vTref) will with 
probability at least 1/3 be contained in the set A. Since the reference Markov chain is 
ergodic, the composite map procedure projected down to the state space of the reference 
Markov chain is also ergodic, so that regardless of its current state, if we wait long enough 
(where "long enough" may depend upon the current state), with probability at least 1/6 
the state will lie in the set A. The probability of the next composite map being coalescent 
is thus at least 1/24. So we see that with probability 1, there is some positive t such that 
Ct = I. Since the C^'s are stationary process, with probability 1 there is also some negative 
t such that Ct = 1. 

Let S be the smallest positive number such that C_s = 1, and let T be the smallest 
non- negative number such that Ct = 1- More generally, let Sk and T^ be the values that 
5* and T would assume if the indices of the Ct process were decremented by k. Thus the 
gap between the two coalescent composite maps straddling the state that time k is given 
by Sk + Tfc. By stationarity, each pair (S'fc,Tfc) is distributed in the same manner as {S,T). 
Consider these variables as we increase k. Unless T^ = 0, we have that Tj^^^i = T^ — 1 and 
Sk+i = Sk + l. If on the other hand T^ = 0, then T^+i takes on some new non- negative value 
and Sk+i = 1. By increasing k many times and averaging, we see that given S + T, the pair 
{S, T) is uniformly distributed amongst (legal) pairs whose sum is S" + T. In particular, S 
has the same distribution as T + 1. 

To determine the state of the target Markov chain at time 0, we can determine the 
smallest positive number S such that /_5 is coalescent, and output /_i o /_2 o ■ • ■ o f_s. 
Alternatively, we can randomly generated S from its appropriate distribution, then using 
new randomness, generate a sequence of S random composite maps conditioned so that the 
first is coalescent while the remaining ones are not, and return the composition of these 
maps. Since 5* has the same distribution as T + 1 above, we may generate random composite 
maps until one of them is coalescent, and set 5* to be the number of generated maps. To 
generate the sequence of S composite maps conditioned so that only the first of them (rather 
than the last of them) is coalescent, we can use rejection sampling. At no point do we need 
to reuse previously used randomness. Pseudocode for this procedure is given in Figure ^. 

Next we consider the expected running time of this read-once coupling into and from the 
past procedure. Let pk = Pt[S = k]. When RejectionSample(/c) it is called, the expected 
number of trials before one is accepted is l/pk- Since T + 1 and S are equidistributed, with 
probability pk ReadOnceCIAFTP() calls RejectionSample(A;). Thus the expected number 
times that RejectionSample() gets called is J2kPk ^ ^/Pk = oo, unless of course only finitely 
many p^s are nonzero. 

Despite the expected running time being infinite, since J2kPk = 1; the algorithm does 
terminate with probability 1. 

Remark: Upon reading a preliminary explanation of how to do read-once CFTP, Duncan 
Murdoch suggested a version that involves the doubling of starting times in the past used 
by the binary-backoff CFTP protocol. This version fairs poorly when compared with the 
read-once CFTP protocol given in Figure 0. But when the binary-backoff variation of read- 
once CFTP is adapted to couphng into and from the past, since the protocol in Figure ^ 
has infinite expected running time, there appears to be no reason to prefer either variation 
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Rej ectionSample (Length) 

X := ReferenceChainRandomStateO 
State := (arbitrary state compatible with X) 
RandomComposite (X , St ate , CoalescenceFlag) 
if not CoalescenceFlag 

return RejectionSaniple(Length) 
for Count: =2 to Length 

RandomComposite (X , St ate , CoalescenceFlag) 

if CoalescenceFlag 

return RejectionSample(Length) 
return State 

ReadOnceCIAFTPO 

X := ReferenceChainRandomStateO 

State := (arbitrary state compatible with X) 

Count := 

repeat 

RandomComposite(X, State, CoalescenceFlag) 
Count := Count+1 
until CoalescenceFlag 
return RejectionSample(Count) 

Figure 9: Pseudocode for read-once coupling into and from the past. Here the random 
composite map is assumed to be defined by an external procedure, since it is called from 
multiple locations. (Recall that the composite map was implicitly defined in the pseudocode 
for read-once coupling from the past, since it would have been called from only one location.) 
The Set2 argument to RandomComposite is optional. While this procedure does work, we 
don't recommend it, on account of its infinite expected running time. We leave as a open 
problem the task of finding a more reasonable (finite expected running time) read-once 
version of coupling into and from the past. 
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to the other. There are many other variations that work as well, but it is not clear whether 
or not there is a variation that has finite expected running time. 

9. Locally stable point processes 

The principal application to which coupling into and from the past has been applied are 
the locally stable point processes considered by [Kendall and M0ller (1999| ). While we don't 
have a general-purpose read-once coupling into and from the past protocol, or at least not 
one that runs in finite expected time, in this section we see how to apply read-once CFTP 
to generate samples from many locally stable point processes within a reasonable amount of 
time. 

To apply read-once CFTP we need to construct a suitable composite random map, which 
is coalescent fairly frequently, and which does not take any state information as input, such 
as the value of the dominating Markov chain, as the auxiliary state information would 
introduce dependencies between subsequent random maps. Since the natural Markov chain 
is typically not uniformly ergodic (which was the reason for introducing the dominating chain 
in the first place), we use Murdoch's technique of mixing the natural Markov chain with an 
independence sampler. 

Following Kendall and M0ller's notation, we let / denote the probability density function 
of the point process relative to a Poisson process with parameter A (where A is often constant 
throughout the region, but can in general vary within the region). The process is called 
"locally stable" if for some K adding a point to any given configuration increases its density 
by no more than a factor of K. The natural Markov chain that Kendall and M0ller (1999 ) 



use is the continuous-time birth-and-death chain in which points die with rate 1, and are 
proposed with rate K\. A proposed point x is born, i.e. added to the configuration a, with 
probability given by /(cr U {x})/[Kf{a)]. Since detailed balance is satisfied, this Markov 
chain has the desired stationary distribution. 

Assume for the time being that the density function is always positive, as it is for say 
the Strauss process. (For the Strauss process, f{a) = -y# pairs of pomts closer than r ^^^ some 
interaction radius r and parameter 7 < 1 ( [Strauss, 1975| ) ( [Kelly and Ripley, 1976[) .) Later 



we explain how to deal with point processes such as the impenetrable spheres model where 
the density function can be zero. 

Our representation for a set of configurations of the point process will consist of an 
integer k and two finite sets of points A and L. The set represented by {k, A, L) consists 
of all configurations that contain each point in L, possibly some of the points in A, and at 
most k additional points that could be anywhere. 

The first step of the composite map is to generate a Poisson point configuration with 
suitably high intensity parameter, such as 2KX, and use it as a proposal for a Metropolis- 
Hastings update. Say that the proposed state a has #cr points. For typical states of the 
point process, the proposal a will likely be rejected, but for all starting states with at least 
B{a) = 7^o- + #crlog2 -R' + log2 l//(o") points, the proposal will be accepted. After the update 
we know that the state has at most B{a) points, but the points could be anywhere. Thus 
the (super)set of possible configurations is represented by (-B(cr), 0, 0). 

Next we proceed to do updates according to the usual birth-and-death process. The k 
points at unknown locations each die with rate 1, so the integer in the representation is 
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decremented at rate k. Points are proposed with rate KX, which are then added to A. The 
points in A also die with rate 1. The set L remains empty. Eventually k = 0, implying 
that each of the original points at unknown locations has died, and so the Markov chain 
state must be a subset of A. Since we now have a finite upper bound and lower bound on 
the current state, we can continue to run the birth-and-death process and start updating 
the representation of possible configurations as described by [Kendall and M0ller (1999]) , i.e. 
using for instance monotone or anti-monotone coupling. Eventually A = 0, at which time 
the the Markov chain is guaranteed to be in state L. Let T be the amount of time that the 
birth-and-death process was run, i.e. the time between the Metropolis-Hastings update and 
the time that coalescence is achieved. We throw out the coalesced state and remember T. 

Then we do essentially the same thing again, except that we run the birth-and-death 
process for an amount of time equal to T rather than until the time that coalescence is 
achieved. We need to evaluate the random map at the input state as well as detect whether 
or not it is coalescent. Let a' be the Metropolis-Hastings proposal during this second round, 
and let Tom be the state after applying Metropolis-Hastings update to the input. When 
running the birth-and-death process during this second round, we maintain an integer j 
and finite sets of points Tom, Tncw, ^5 and L. All these sets (except Tom) are initialized to 
0. The current state resulting from the input state is r = Tom U Tncw When running the 
birth-and-death process, any point that is born into r is inserted into Tnew, but of course 
points in both sets die with rate 1. The integer k is given by k = j + #roid, so we initialize 
j to -B(cr') — #roid and decrement j at rate j. Thus we can detect coalescence in the same 
manner as we did in the first round, and whether or not coalescence is achieved the second 
time, we can determine the final state for any given starting state. As usual, the random 
map preserves the desired probability distribution since both the Metropolis-Hastings update 
and the birth-and-death process preserve the distribution, and because the birth-and-death 
process is run for an amount of time that is independent of the birth-and-death process itself. 
If coalescence is in fact officially achieved the second time, then the coalescence flag is set to 
be true; with probability 1/2 this flag is set to be true. Thus the composite map satisfles all 
the requisite properties to be used with read-once CFTP. 

This composite map procedure for locally stable point processes is one reason that in 
§ ^ we recommended doing only the flrst update using the independence sampler, and doing 
subsequent updates with the natural Markov chain. While it may be possible to use the 
independence sampler more frequently here, doing so would at the very least unnecessarily 
complicate the procedure. 



Figure 10 shows perfectly random samples drawn from the Strauss point process and the 
impenetrable spheres model, which were generated using the approach described here. 

Next we consider the distribution of the coalescence time T, which we can break apart 
as T = Ti + T2, where Ti is the time for all the points at unknown locations to die, and 
T2 is the remaining time for coalescence. At time Ti, the set A is stochastically dominated 



by a Poisson point process with intensity KX. Since Kendall and M0ller (1999|) coalesce 



all the states that are subsets of such a Poisson point process, the remaining time T2 until 
coalescence will be at most as large as the coalescence time for Kendall and M0ller. The time 
Ti will be about \ogB{a) + 0(1). One might object that the time it takes the procedure to 
decrement /c to is not proportional to Ti, but is instead proportional its initial value B{a). 
But since Ti distributed in the same manner as — log(l — f/^/^*^""-'), where U is uniformly 
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Figure 10: Perfectly random samples of the Strauss point process. In both panels the points 
have interaction radius 1 and occur in a 20 x 20 region with free boundary conditions. Two 
points interact, contributing a factor of 7 to the probability density, if the circles drawn 
around them overlap. On the left A = 2 and 7 = 1/2. On the right A = 1 and 7 = (the 
impenetrable spheres model). When 7 = 0, it is an immediate consequence of ( |Luby anB| 
[Vigoda, 1999|) that the coupling is rapid for A < 2/7r; in practice it is rapid for A < 1. 



distributed between and 1, we can optimize the procedure to avoid needless decrementing. 
Recall that B{a) = #0" + #crlog2 K + log2 1/ f{a) where a is a Poisson point process with 
intensity 2KX. Thus we can expect Ti to be fairly small unless f{(j) is extraordinaraly close 
to 0. As /(fj) -^ 0, the time Ti grows like loglogl//((T). 

This very weak dependence of the time Ti upon /(cr) allows us to run the algorithm even 
for point processes such as the impenetrable spheres model where the density function f{a) 
can be zero. What we do is "soften" the density, and work with a modified density. For 
the case of the impenetrable spheres model, rather than multiply the density function by 
for each pair of points that are too close together, we can instead multiply the density by 
say 10~^° for each such pair. We generate point configurations from this modified density, 
and then do rejection sampling to obtain a sample from the desired density. Since 10~^° is 
extremely close to 0, we almost never reject the sample, and since the run time dependence 
upon /(cr) is so weak, the run time is not adversely affected. 

In practice the time T2 is either on par with or else dwarfs the time Ti. Since the run 
time of the algorithm used by [Kendall and M0ller (1999| ) is only marginally larger than T2 , 
we expect the running time of their algorithm and ours to be similar, with smaller memory 
requirements for the algorithm described here. Our algorithm for the Strauss process also 
appears to be competitve with another algorithm given by |M0ller and NichoUs (1999| ). 
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10. Summary and open problems 

We have given the modification of the couphng from the past protocol which only requires 
a read once source of randomness. This read-once CFTP protocol is on par with the usual 
CFTP protocol in terms of memory and time, and for some applications will be up to 
logarithmically faster. Read-once CFTP is closely related to the PASTA property from 
operations research. We have also given a read-once version of the coupling into and from 
the past protocol, but it is unsatisfactory since the expected running time is infinite. We leave 
as open problems the existence of a better read-once version of CIAFTP, and the existence 
of further connections between PASTA-type thereoms and perfect sampling algorithms. 

Source code 

' ■ "■ "'^ is 



The source code for the program used to make the Strauss process samples in Figure |T0 
available at http://dbwilson.coin/strauss/. 
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